IUP Publications Online
Home About IUP Magazines Journals Books Archives
     
Recommend    |    Subscriber Services    |    Feedback    |     Subscribe Online
 
The IUP Journal of Computer Sciences :
A Comparative Analysis of Huffman Coding with Uniform Coding
:
:
:
:
:
:
:
:
:
 
 
 
 
 
 
 

In this paper, we analyze the compression ratio achieved by Huffman coding with that of uniform coding. For this, we first study the skewness property of the Huffman coding tree. We demonstrate that this tree will be completely skewed when the sorted frequency distribution of characters satisfies certain prefix properties. We also establish that among all the frequency distributions f1, f2, …, fn of a set of n characters that satisfy the prefix property, the average code length is maximum when the frequency distribution is a Fibonacci sequence. Then we estimate the average code length of Huffman coding for this frequency distribution.

 
 
 

One of the most popular entropy encoding schemes for text data compression is Huffman coding. Huffman code is an optimal unique prefix code. In other words, given a set characters for which the frequency of occurrence of each character is known in advance, the maximum possible compression is achieved by Huffman coding. Also, no character code is a proper prefix of another character code. This makes the code unambiguous and uniquely decodable. Unlike uniform coding where we encode each of the n characters in the character set with equal size code length of [log n] bits, Huffman coding is a variable length coding. For more detailed information relating to Huffman coding refer to Cormen et al. (2006).

In this paper, we analyze the compression ratio achieved by Huffman coding with that of uniform coding. For this, we first study the skewness property of the Huffman coding tree. We demonstrate that this tree will be completely skewed when the sorted frequency distribution of characters satisfies certain prefix properties. We also establish that among all the frequency distributions f1, f2, …, fn of a set of n characters that satisfy the prefix property, the average code length is maximum when the frequency distribution is a Fibonacci sequence. Then we estimate the average code length of Huffman coding for this frequency distribution.

The overall organization of the paper is as follows: in Section 2, we study the frequency distribution of characters for which the Huffman coding tree will be completely skewed. For this, we introduce the notion of prefix property of a sorted sequence. We also establish that for Fibonacci sequence, which also satisfies this prefix property, the average code length will be the maximum. In Section 3, we compute the average code length of Huffman code when the frequency distribution follows the Fibonacci sequence. Lastly, in Section 4, we conclude by summarizing our contribution.

 
 
 

Computer Sciences Journal, Business Intelligence, Enterprise Systems, Enterprise Resource Planning, Customer Relationship Management, CRM, Business Operations Management, Business Process Mining, Finite State Machine, Transactional Information System, Genetic Algorithms, Decision Making Process, Data Mining Tools, Online Analysis Processing, OLAP, Artificial Intelligence.